Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

Wu, Shanshan; Dimakis, Alexandros; Sanghavi, Sujay; Yu, Felix; Holtmann-Rice, Daniel; Storcheus, Dmitry; Rostamizadeh, Afshin; Kumar, Sanjiv (June 2019, International Conference on Machine Learning (ICML))

Linear encoding of sparse vectors is widely popular, but is commonly data-independent – missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used l1 decoder. The convex l1 decoder prevents gradient propagation as needed in standard gradient-based training. Our method is based on the insight that unrolling the convex decoder into T projected subgradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing measurement matrix. We compare the empirical performance of 10 algorithms over 6 sparse datasets (3 synthetic and 3 real). Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods. We illustrate an application of our method in learning label embeddings for extreme multi-label classification, and empirically show that our method is able to match or outperform the precision scores of SLEEC, which is one of the state-of-the-art embedding-based approaches.
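
To make the unrolling idea concrete, here is a minimal sketch of a projected subgradient decoder for the l1 problem (minimize ||x||_1 subject to Ax = y), written in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `l1_decode_unrolled`, the diminishing step size `step / t`, the use of the exact pseudoinverse for the projection, and the best-iterate bookkeeping are all choices made for this example. The point it shows is that T decoder iterations reduce to a fixed sequence of matrix products and sign operations, which is the kind of computation a gradient-based framework could differentiate with respect to A.

```python
import numpy as np


def l1_decode_unrolled(A, y, T=100, step=1.0):
    """Approximate argmin ||x||_1 s.t. A x = y with T projected subgradient steps.

    Hypothetical helper for illustration only. Assumes A has full row rank,
    so that A @ pinv(A) is the identity and every iterate stays feasible.
    """
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                      # feasible start: least-norm solution
    best_x, best_obj = x.copy(), np.abs(x).sum()
    for t in range(1, T + 1):
        g = np.sign(x)                  # a subgradient of ||x||_1 at x
        g_proj = g - A_pinv @ (A @ g)   # project g onto the null space of A
        x = x - (step / t) * g_proj     # diminishing step keeps A x = y
        obj = np.abs(x).sum()
        if obj < best_obj:              # subgradient steps are not monotone,
            best_x, best_obj = x.copy(), obj  # so keep the best iterate seen
    return best_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))   # random (not learned) measurement matrix
    x_true = np.zeros(50)
    x_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
    x_hat = l1_decode_unrolled(A, A @ x_true, T=200)
    print("l1 norm of estimate:", np.abs(x_hat).sum())
    print("measurement residual:", np.linalg.norm(A @ x_hat - A @ x_true))
```

In this sketch A is a fixed random matrix; in a learned setting A would be the trainable parameter and the loop would be expressed in an automatic differentiation framework so that the reconstruction error can be backpropagated through all T steps.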
UniProt: the Universal Protein Knowledgebase in 2023

https://doi.org/10.1093/nar/gkac1052

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bye-A-Jee, Hema; Cukura, Austra; et al (November 2022, Nucleic Acids Research)

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
UniProt: the universal protein knowledgebase in 2021

https://doi.org/10.1093/nar/gkaa1100

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Agivetova, Rahat; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bursteinas, Borisas; et al (November 2020, Nucleic Acids Research)

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.